{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Signal Subgraph Estimators"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### This tutorial will introduce the following signal-subgraph estimators:\n",
    "- Incoherent subgraph estimator\n",
    "- Coherent subgraph estimator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import graspologic.subgraph as sg\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Preliminaries\n",
    "\n",
    "The general graph model is characterized by $M_V(m,s;\\pi,p,q)$, where: \n",
    "\n",
    "$V$ - the number of vertices in the graph\n",
    "\n",
    "$n$ - the number of graph samples\n",
    "\n",
    "$s$ - the number of edges that must be present in the subgraph\n",
    "\n",
    "$m$ - the number of vertices that each edge in the subgraph must be incident to\n",
    "\n",
    "$\\pi$ - the probability of a graph sample being of class 1\n",
    "\n",
    "$p$ - the probability of edges in the signal-subgraph, conditioned on Class 0\n",
    "\n",
    "$q$ - the probability of edges in the signal-subgraph, conditioned on Class 1\n",
    "\n",
    "The signal-subgraph is a subset of edges with distinct class-conditional likelihood parameters. The signal-subgraph estimator evaluates a test statistic for each edge in the graph and selects $s$ edges with the lowest test statistic, where $s$ is the desired size of the signal-subgraph. \n",
    "\n",
    "The estimator that is used to find the signal-subgraph determines certain properties of the resulting subgraph. Both estimators use $s$ to determine the size of the resulting subgraph. $m$ is only used for the coherent estimator, which constrains the subgraph to $m$ vertices. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Incoherent Signal-Subgraph Estimator\n",
    "\n",
    "For this example we will randomly select 20 edges from a graph with 70 vertices. These edges will have distinct class-conditional edge probabilities, and the graphs will be sampled from the model $M_{70}(20; 0.5, 0.8, 0.1)$, with $n = 100$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from graspologic.plot import heatmap\n",
    "\n",
    "verts = 70\n",
    "sedges = 20\n",
    "pi = 0.5\n",
    "p = 0.8\n",
    "q = 0.1\n",
    "nsamples = 100\n",
    "\n",
    "np.random.seed(8888)\n",
    "classlabels = np.zeros(nsamples, dtype=int)\n",
    "classlabels[1::2] = 1\n",
    "\n",
    "sigsubindex = np.random.choice(verts ** 2, sedges, replace=False)\n",
    "vect = p * np.ones(verts ** 2)\n",
    "vect[sigsubindex] = q\n",
    "vect = np.reshape(vect, (verts, verts))\n",
    "expected = np.where(vect == q, 1, 0)\n",
    "\n",
    "blank = vect[:, :, None] + np.zeros(int(nsamples / 2))\n",
    "A = p * np.ones((verts, verts, nsamples))\n",
    "A[:, :, 1::2] = blank\n",
    "A = np.random.binomial(1, A)\n",
    "\n",
    "sigsub = sg.SignalSubgraph()\n",
    "sigsub.fit_transform(graphs=A, labels=classlabels, constraints=sedges)\n",
    "\n",
    "estimatesigsub = np.zeros((verts, verts))\n",
    "estimatesigsub[sigsub.sigsub_] = 1\n",
    "\n",
    "fig, ax = plt.subplots(ncols=2, figsize=(10, 5), constrained_layout=True)\n",
    "heatmap(expected, ax=ax[0], cbar=False, title=\"Expected Signal-Subgraph\")\n",
    "_ = heatmap(estimatesigsub, ax=ax[1], cbar=False, title=\"Estimated Signal-Subgraph\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that because $p$ and $q$ are sufficiently distinct and $n$ is sufficiently large, the Expected and Estimated signal-subgraphs should match exactly. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Incoherent Signal-Subgraph Estimator\n",
    "\n",
    "Once again, we will randomly select 20 edges from a graph with 70 vertices. These edges will have distinct class-conditional edge probabilities, but the graphs will be sampled from the model $M_{70}(1, 20; 0.5, 0.8, 0.1)$, with $n = 100$.\n",
    "\n",
    "The estimated signal-subgraph will have 20 edges, constrained so that each edge must be incident to the same vertex. First, we will use the same expected signal-subgraph as the previous example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.random.seed(8888)\n",
    "classlabels = np.zeros(nsamples, dtype=int)\n",
    "classlabels[1::2] = 1\n",
    "\n",
    "sigsubindex = np.random.choice(verts ** 2, sedges, replace=False)\n",
    "vect = p * np.ones(verts ** 2)\n",
    "vect[sigsubindex] = q\n",
    "vect = np.reshape(vect, (verts, verts))\n",
    "expected = np.where(vect == q, 1, 0)\n",
    "\n",
    "blank = vect[:, :, None] + np.zeros(int(nsamples / 2))\n",
    "A = p * np.ones((verts, verts, nsamples))\n",
    "A[:, :, 1::2] = blank\n",
    "A = np.random.binomial(1, A)\n",
    "\n",
    "sigsub = sg.SignalSubgraph()\n",
    "sigsub.fit_transform(A, classlabels, [20, 1])\n",
    "\n",
    "estimatesigsub = np.zeros((verts, verts))\n",
    "estimatesigsub[sigsub.sigsub_] = 1\n",
    "\n",
    "fig, ax = plt.subplots(ncols=2, figsize=(10, 5), constrained_layout=True)\n",
    "heatmap(expected, ax=ax[0], cbar=False, title=\"Expected Signal-Subgraph\")\n",
    "_ = heatmap(estimatesigsub, ax=ax[1], cbar=False, title=\"Estimated Signal-Subgraph\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice how the coherent estimator constrains the estimated signal-subgraph to 20 edges that are incident to 1 vertex with the best total significance values. Now, we will try an expected signal-subgraph that is also limited to one vertex."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mverts = 1\n",
    "\n",
    "np.random.seed(7777)\n",
    "classlabels = np.zeros(nsamples, dtype=int)\n",
    "classlabels[1::2] = 1\n",
    "\n",
    "m = np.random.choice(verts, mverts)\n",
    "vect = p * np.ones(2 * verts * mverts - (mverts ** 2))\n",
    "vect[np.random.choice(len(vect), sedges, replace=False)] = q\n",
    "\n",
    "blank = p * np.ones((verts, verts))\n",
    "blank[m, :] = np.nan\n",
    "blank[:, m] = np.nan\n",
    "blank[np.isnan(blank)] = vect\n",
    "expected = np.where(blank == q, 1, 0)\n",
    "\n",
    "blank = blank[:, :, None] + np.zeros(int(nsamples / 2))\n",
    "A = p * np.ones((verts, verts, nsamples))\n",
    "A[:, :, 1::2] = blank\n",
    "A = np.random.binomial(1, A)\n",
    "\n",
    "sigsub = sg.SignalSubgraph()\n",
    "sigsub.fit_transform(graphs=A, labels=classlabels, constraints=sedges)\n",
    "\n",
    "estimatesigsub = np.zeros((verts, verts))\n",
    "estimatesigsub[sigsub.sigsub_] = 1\n",
    "\n",
    "fig, ax = plt.subplots(ncols=2, figsize=(10, 5), constrained_layout=True)\n",
    "heatmap(expected, ax=ax[0], cbar=False, title=\"Expected Signal-Subgraph\")\n",
    "_ = heatmap(estimatesigsub, ax=ax[1], cbar=False, title=\"Estimated Signal-Subgraph\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that the expected signal-subgraph is constrained to the coherent signal-subgraph model, the Expected and Estimated signal-subgraphs are exactly equal. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
